OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations

Internal identifier: 001129 (Main/Exploration); previous: 001128; next: 001130

Authors: Mona Omidyeganeh [Iran]; Reza Azmi [Iran]; Kambiz Nayebi [Iran, United States]; Abbas Javadtalab [Iran, United States]

Source:

RBID : ISTEX:F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D

Abstract

A new segmentation algorithm for multifont Farsi/Arabic text, based on conditional labeling of up and down contours, was presented in [1]. A preprocessing technique adjusted the local baseline for each subword. The adaptive baseline, the up and down contours, and their curvatures were used to improve the segmentation results. The algorithm correctly segments 97% of 22236 characters in 18 fonts. However, achieving high performance in the multifont case is challenging, because each font has different characteristics. Here we propose adding extra classes in the recognition stage. The extra classes are parts of characters, or combinations of two or more characters, that cause most of the errors in the segmentation stage. These extra classes are determined statistically. We used a training document of 4820 characters in 4 fonts. The segmentation result improves from 96.7% to 99.64%.
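The idea of determining extra classes statistically can be illustrated with a small sketch. It is not the authors' procedure: the abstract gives no selection rule, so the frequency threshold `min_share`, the function name `select_extra_classes`, and the pattern labels below are all hypothetical and serve only to show the general shape of such a selection.

```python
from collections import Counter


def select_extra_classes(error_patterns, min_share=0.2):
    """Return the error patterns whose share of all errors is at least min_share.

    error_patterns: one label per segmentation error seen in a training
    document (hypothetical labels; the source does not list them).
    min_share: assumed cut-off, not taken from the paper.
    """
    counts = Counter(error_patterns)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders patterns from most to least frequent.
    return [p for p, c in counts.most_common() if c / total >= min_share]


# Toy data, invented for illustration only.
errors = ["lam+alef"] * 5 + ["sin-part"] * 3 + ["kaf+alef"] + ["other"]
extra = select_extra_classes(errors, min_share=0.2)
print(extra)
```

Patterns that pass the cut-off would then be added to the recognizer's class list, so that a frequent mis-segmentation is recognized as a unit instead of being counted as an error. By the abstract's own figures, the reported gain corresponds to the error rate falling from 3.3% to 0.36%.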

Url:
DOI: 10.1007/978-3-540-69423-6_65


Affiliations:


Links toward previous steps (curation, corpus...)


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations</title>
<author>
<name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
</author>
<author>
<name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
</author>
<author>
<name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
</author>
<author>
<name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/978-3-540-69423-6_65</idno>
<idno type="url">https://api.istex.fr/document/F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001B02</idno>
<idno type="wicri:Area/Istex/Curation">001992</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B02</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Omidyeganeh M:a:new:method</idno>
<idno type="wicri:Area/Main/Merge">001146</idno>
<idno type="wicri:Area/Main/Curation">001129</idno>
<idno type="wicri:Area/Main/Exploration">001129</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations</title>
<author>
<name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Iran Telecommunication Research Center (ITRC), Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author>
<name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Computer Dep., Azzahra University, Vanak, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author>
<name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Eng. Dep., Sharif University, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Iran</country>
<wicri:regionArea>Computer Eng. Dep., Sharif University, Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D</idno>
<idno type="DOI">10.1007/978-3-540-69423-6_65</idno>
<idno type="ChapterID">65</idno>
<idno type="ChapterID">Chap65</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: A new segmentation algorithm for multifont Farsi/Arabic text, based on conditional labeling of up and down contours, was presented in [1]. A preprocessing technique adjusted the local baseline for each subword. The adaptive baseline, the up and down contours, and their curvatures were used to improve the segmentation results. The algorithm correctly segments 97% of 22236 characters in 18 fonts. However, achieving high performance in the multifont case is challenging, because each font has different characteristics. Here we propose adding extra classes in the recognition stage. The extra classes are parts of characters, or combinations of two or more characters, that cause most of the errors in the segmentation stage. These extra classes are determined statistically. We used a training document of 4820 characters in 4 fonts. The segmentation result improves from 96.7% to 99.64%.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Iran</li>
<li>États-Unis</li>
</country>
</list>
<tree>
<country name="Iran">
<noRegion>
<name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
</noRegion>
<name sortKey="Azmi, Reza" sort="Azmi, Reza" uniqKey="Azmi R" first="Reza" last="Azmi">Reza Azmi</name>
<name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
<name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
<name sortKey="Omidyeganeh, Mona" sort="Omidyeganeh, Mona" uniqKey="Omidyeganeh M" first="Mona" last="Omidyeganeh">Mona Omidyeganeh</name>
</country>
<country name="États-Unis">
<noRegion>
<name sortKey="Nayebi, Kambiz" sort="Nayebi, Kambiz" uniqKey="Nayebi K" first="Kambiz" last="Nayebi">Kambiz Nayebi</name>
</noRegion>
<name sortKey="Javadtalab, Abbas" sort="Javadtalab, Abbas" uniqKey="Javadtalab A" first="Abbas" last="Javadtalab">Abbas Javadtalab</name>
</country>
</tree>
</affiliations>
</record>
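
The record above can also be read programmatically. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. Two things in it are assumptions and not part of the record: the namespace URI `urn:example:wicri`, which is made up here because the record uses the `wicri:` prefix without declaring it and a strict parser would otherwise reject it, and the helper names `parse_record` and `summarize`. The embedded sample is an abbreviated copy of the record, kept short for readability.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the record never declares the "wicri" prefix.
WICRI_NS = "urn:example:wicri"

# Abbreviated copy of the record (only a few fields kept).
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations</title>
<author>
<name sortKey="Omidyeganeh, Mona">Mona Omidyeganeh</name>
</author>
<author>
<name sortKey="Azmi, Reza">Reza Azmi</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1007/978-3-540-69423-6_65</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(text):
    """Parse a record after binding the undeclared 'wicri' prefix on the root."""
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, author names, and DOI from the header."""
    title = root.findtext(".//titleStmt/title")
    authors = [n.text for n in root.findall(".//titleStmt/author/name")]
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


summary = summarize(parse_record(RECORD))
print(summary["authors"], summary["doi"])
```

Patching the root tag is only a workaround for the missing declaration; a lenient parser, or a copy of the record that declares the prefix itself, would make that step unnecessary.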

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001129 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001129 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:F4C1B1B87FBA724DD4D54E8CF934ABE87D09F99D
   |texte=   A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024